ABSTRACT


This is a three-level package designed to allow statistical 
analysis for a variety of applications within the USL DBMS
NASA/RECON project. Designed with flexibility and uniformity as
the main considerations, it is expected to provide computational 
capabilities for a variety of user needs, beginner to expert, in 
three different forms: a library package, an interactive package 
and a batch-processing package. 
FOR THE DEVELOPMENT
OF A USL NASA PC R&D 
STATISTICAL ANALYSIS SUPPORT PACKAGE 


I. INTRODUCTION


This is a proposal for the design, development and
implementation of a general-purpose statistical package for the
NASA/RECON project.



Statistical packages offer the user the power and
flexibility they need, without having to write complicated
programs. In addition, the user can be assured of the accuracy of
the results. Many statistical packages have been developed so
far for all types and sizes of computers.

There are three major types of statistical packages
available for the user:

i. Statistical Program Libraries.

Statistical Libraries are collections of programs that 
are bound together in one collection. The user can call 
them from his/her application programs, supplying the 
appropriate arguments and obtaining the results in a 
similar way.

ii. Interactive Statistical Packages.

Interactive Statistical Packages allow the user to
interact with the computer. The user is placed at a
"command level", where he/she issues commands and
enters data. The program processes the data and returns
the results to the user on the terminal screen.

iii. Batch Statistical Programs. 

Batch programs allow the user to collect all his/her
commands to the program in one group, code them in a
particular language, and then process the entire batch.
The user does not interact with the execution at all.
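
As a hedged illustration of the library form (i), the following minimal C sketch shows how an application program might call a routine, supplying the arguments and obtaining the result through an output argument. The names stat_mean, stat_status, STAT_OK and STAT_BAD_INPUT are hypothetical; this proposal does not define any function names or calling conventions.

```c
#include <stddef.h>

/* Hypothetical status codes; not defined anywhere in the proposal. */
enum stat_status { STAT_OK = 0, STAT_BAD_INPUT = 1 };

/* Hypothetical library routine: arithmetic mean of x[0..n-1].
 * The result is stored in *result; the return value reports success
 * or failure so that the caller can check for errors. */
enum stat_status stat_mean(const double *x, size_t n, double *result)
{
    double sum = 0.0;
    size_t i;

    if (x == NULL || result == NULL || n == 0)
        return STAT_BAD_INPUT;

    for (i = 0; i < n; i++)
        sum += x[i];

    *result = sum / (double)n;
    return STAT_OK;
}
```

An application program would call, for example, stat_mean(data, n, &m) and test the returned status before using m.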

This research and development proposal intends to implement
all three packages under a unified interface. The result is
expected to be a flexible and powerful package with common
characteristics among its three forms. It is also intended to
be completely transportable among all computers that support
the "C" programming language. This includes 3 of the 4 large
computer systems available at USL, namely the DEC UNIX
VAX-11/780, the DEC VMS VAX-11/780 and the Pyramid Technologies
90x, and, of course, the IBM PC/XT.
II. OBJECTIVES OF THE PROJECT

The generic objectives of the project are as follows: 

i. To develop a powerful, flexible, easy-to-use and 
transportable statistical package. 

ii. To improve our knowledge in the fields of statistics and 
numerical computation. 

iii. To obtain further experience in the design, implementation,
testing and maintenance of a major software product.

The specific objectives of the statistical package design are as
follows:

i. Computational power: the objective is a design that can
satisfy most user needs in terms of available functions and
options. The package should offer a full range of commands
for most applied statistical computations.

ii. Design flexibility: The design must be flexible so that
changes, improvements and the addition of more functions can be
accommodated without major changes, if any, to the entire
package.

iii. Ease of use: the design should be such that any of the three
interfaces will be easy to use efficiently. This includes
error checking and, in the case of the interactive user
interfaces, available online help. Uniform command and
function formats will also be used in all three modes.

iv. Efficiency and accuracy: Efficiency of the algorithms
used is very important, considering that the package to be
developed will be used on mini and micro computers, which
often have limited resources and execution speed.
Accuracy is also critical so that the user is assured of the
quality of the results.

v. Package Transportability: The programs should be written in
a way that ensures transportability between varying
operating environments. A standard programming policy will be
adopted for all modules.

This design will be first implemented on the IBM Personal
Computers of the NASA/RECON Project. Parallel development on the
DEC VAX-11/780 will also be considered.
III. METHODOLOGY

A highly modular approach will be followed in the design and 
implementation of this project. This will ensure that many of the 
design objectives, in particular, flexibility and modifiability, 
are inherent in the implementation. 

To achieve the transportability and modularity goals, the
"C" programming language was chosen as the implementation
language. It offers good performance characteristics and high
modularity, which make it most desirable. It is also powerful in
character and file manipulation, which makes it even more
desirable for the second and third phases of the design.

Algorithm selection is critical in the computational parts.
Therefore, extensive research will have to be performed in order
to determine the most appropriate ones to be used. While a
direct computer translation of manual (hand-calculation)
algorithms can be used, in some cases it is not efficient and
better methods should be found.
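
One hedged example of this trade-off is the sample variance. The hand-calculation formula, built from the sums of X and X squared, translates directly into code, but it subtracts two large, nearly equal quantities and can lose accuracy when the data have a large mean. Welford's one-pass update is a well-known alternative. The sketch below shows both; the function names var_textbook and var_welford are hypothetical, and the proposal does not state which algorithm the package will use.

```c
#include <stddef.h>

/* Direct translation of the hand-calculation formula:
 * (SumX2 - SumX * SumX / n) / (n - 1).
 * Returns 0.0 when fewer than two values are supplied. */
double var_textbook(const double *x, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    size_t i;

    if (n < 2)
        return 0.0;
    for (i = 0; i < n; i++) {
        sum += x[i];
        sum_sq += x[i] * x[i];
    }
    return (sum_sq - sum * sum / (double)n) / (double)(n - 1);
}

/* Welford's one-pass algorithm: keeps a running mean and a running
 * sum of squared deviations (m2), avoiding the large subtraction.
 * Returns 0.0 when fewer than two values are supplied. */
double var_welford(const double *x, size_t n)
{
    double mean = 0.0, m2 = 0.0;
    size_t i;

    if (n < 2)
        return 0.0;
    for (i = 0; i < n; i++) {
        double delta = x[i] - mean;
        mean += delta / (double)(i + 1);
        m2 += delta * (x[i] - mean);
    }
    return m2 / (double)(n - 1);
}
```

Both routines make a single pass over the data, so the more accurate method costs little extra in this case.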

At this point only the first phase of the statistical
package has been fully defined. The interactive and batch
interfaces will be designed under the considerations applied to
the first phase, with the final goal being to create a common,
efficient interface for all three modes. A defined command
language for the interactive phase would consist of either a
menu-selection procedure, a command language or a combination.
Likewise, the batch interface can resemble the interactive one in
terms of command names, arguments, etc., or be a completely
different programming language by itself.

For the interactive interface, a spreadsheet configuration
like that of MINITAB is likely to be implemented, with its commands,
arguments and options combined to form the batch programming
language. Therefore, by giving the library interface a
similar structure, the goal of uniformity can be achieved.
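
One way to picture this uniformity is a command table that maps command names to the library routines, so that interactive and batch front ends resolve a name to the same function that a library user calls directly. The following C sketch is only an assumption about how this could be done; the command names "SUM" and "MEAN", the type stat_fn and the function stat_lookup are hypothetical and do not come from this proposal or from MINITAB.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical common signature shared by one-vector routines. */
typedef double (*stat_fn)(const double *x, size_t n);

/* Library routines, callable directly from application programs. */
static double cmd_sum(const double *x, size_t n)
{
    double s = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        s += x[i];
    return s;
}

static double cmd_mean(const double *x, size_t n)
{
    return n == 0 ? 0.0 : cmd_sum(x, n) / (double)n;
}

/* Hypothetical command table shared by all front ends. */
struct stat_command {
    const char *name;
    stat_fn fn;
};

static const struct stat_command stat_commands[] = {
    { "SUM",  cmd_sum  },
    { "MEAN", cmd_mean },
};

/* Return the routine registered under name, or NULL if unknown. */
stat_fn stat_lookup(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof stat_commands / sizeof stat_commands[0]; i++)
        if (strcmp(stat_commands[i].name, name) == 0)
            return stat_commands[i].fn;
    return NULL;
}
```

With such a table, adding a new function to all three modes would amount to writing the routine and adding one table entry, which is in line with the modularity goal.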

As a minimum, the functions shown below are expected to be 
implemented for the first phase (program library). Then the 
interactive and batch interfaces can be built on top of the 
packages. The modularity of the design will allow the addition 
of new functions and/or the modification of existing ones to be 
performed efficiently, with no major code changes in the entire 
program structure. 
PROPOSED CONFIGURATION OF PHASE 1


1. Basic Input/Output

Read from a given file
Read from terminal
Write to a given file
Write to terminal
Report error/warning messages

2. Basic One-Vector Calculations 

SumX, SumX2, sum2X 
Mean, mode, median 
Variance, standard deviation 
Sort ascending, descending, rank 
Frequency, most/least frequent
Relative frequencies, signs 
Max-min, local max/min, k-th max/min 

3. One-Vector Test Statistics 

Confidence intervals 
z-scores, z-tests 
Proportion tests 
Student’s t-test 
Small/large sample sizes

4. Basic One-Vector Graphs

Bar Charts - Histograms 
Frequency graphs 

5. Two-Way Statistics 

Hypothesis testing 

Difference of means 
Variance known/unknown 
D-test

Paired Samples Tests 
Tests for Standard Deviation 
Degrees of Freedom, F-test 
Tests for proportions 

6. Two-Vector Graphics 

Plots 

Scatter Grams 
Frequency Plots 
Charts
X-Y plots 
Distributions 
7. Linear Regression and Correlation

Linear Regression Analysis 
Regression Line calculations 
Correlation Analysis 
Standard Error 

8. Multiple Regression 

Exponential regression analysis 
Logarithmic regression analysis 
Parabolic regression analysis 
Multiple Analysis 

9. Basic Probability Calculations 

Probability distributions 
Normal distribution 
Binomial distribution 
Poisson distribution 
Probability tests 

p-test
Probability confidence intervals

10. Advanced Probability Calculations 

Conditional Probabilities 
Independent Probabilities 
Probability Estimations 
Bayes' Theorem
13.


Basic Combinatoric Calculations 
P(m,n) value 
C(m,n) value 
n! value

Probability distributions of samples

Chi Square Analysis 

Contingency Tables 
Chi Square tests 
Chi Square distribution 
Lambda Index of association 

Analysis of Variance 
One-way analysis 
Two-way analysis 
Difference of several means 
Total Variance calculations 

Non-Parametric Tests
Sign test 
Mann-Whitney test 
Non-parametric ANOVA
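
As a small, hedged illustration of the combinatoric entries in the outline above (P(m,n), C(m,n) and n!), the following C sketch computes each value with a running product rather than with full factorials, which keeps intermediate values small. The function names are hypothetical, and returning double is an assumption made here to postpone overflow; the proposal does not specify either.

```c
/* n! as a double; 0! is 1. */
double stat_factorial(unsigned n)
{
    double r = 1.0;
    unsigned i;
    for (i = 2; i <= n; i++)
        r *= (double)i;
    return r;
}

/* P(m,n) = m! / (m-n)!, computed as m * (m-1) * ... * (m-n+1).
 * Returns 0 when n > m. */
double stat_permutations(unsigned m, unsigned n)
{
    double r = 1.0;
    unsigned i;
    if (n > m)
        return 0.0;
    for (i = 0; i < n; i++)
        r *= (double)(m - i);
    return r;
}

/* C(m,n) = m! / (n! (m-n)!), computed multiplicatively.
 * Uses the symmetry C(m,n) = C(m,m-n) to shorten the loop.
 * Returns 0 when n > m. */
double stat_combinations(unsigned m, unsigned n)
{
    double r = 1.0;
    unsigned i;
    if (n > m)
        return 0.0;
    if (n > m - n)
        n = m - n;
    for (i = 1; i <= n; i++)
        r = r * (double)(m - n + i) / (double)i;
    return r;
}
```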
IV. SUMMARY


Design, implementation, testing and maintenance of this 
major software package is expected to generate a support 
environment for any other activities that require statistical 
analysis within the NASA/RECON or related DBMS projects. The
applicability of statistical analysis methods in information 
storage and retrieval systems is increasing, ranging from 
performance measurement and evaluation to natural language text 
analysis, thus making this project an interesting consideration 
for further research and development. 

The unified environment that this document proposes is 
expected to further improve the user/system interface and make it 
more effective. Portability is also provided in order to have a 
single data analysis environment for more than one hardware 
configuration.